Using Deep Networks for Drone Detection
Drone detection is the problem of finding the smallest rectangle that
encloses the drone(s) in a video sequence. In this study, we propose a solution
using an end-to-end object detection model based on convolutional neural
networks. To address the scarcity of data for training the network, we propose
an algorithm for creating an extensive artificial dataset by combining
background-subtracted real images. With this approach, we achieve both high precision and high recall at the same time.
Comment: To appear in International Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques organised within AVSS 201
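To make the dataset-generation idea concrete, here is a minimal sketch, assuming a clean background plate per video and simple copy-paste compositing; the function names and thresholds are illustrative, not the authors' pipeline.

```python
import cv2
import numpy as np

def extract_drone_mask(frame, background, thresh=25):
    """Foreground mask of the drone via background subtraction (assumed setup)."""
    diff = cv2.absdiff(frame, background)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    # Remove speckle noise so the pasted patch has clean edges.
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

def composite(drone_patch, patch_mask, scene, x, y):
    """Paste the masked drone into a new scene; the paste box is the ground truth."""
    h, w = patch_mask.shape
    region = scene[y:y + h, x:x + w]
    keep = (patch_mask > 0)[..., None]          # broadcast over BGR channels
    scene[y:y + h, x:x + w] = np.where(keep, drone_patch, region)
    return scene, (x, y, x + w, y + h)          # image + bounding-box label
```

Sampling positions, scales, and backgrounds at random then yields an arbitrarily large labeled set for training the detector.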
Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments
Referring to objects in a natural and unambiguous manner is crucial for
effective human-robot interaction. Previous research on learning-based
referring expressions has focused primarily on comprehension tasks, while
generating referring expressions is still mostly limited to rule-based methods.
In this work, we propose a two-stage approach that relies on deep learning for
estimating spatial relations to describe an object naturally and unambiguously
with a referring expression. We compare our method to a state-of-the-art
algorithm in ambiguous environments (e.g., environments that include very
similar objects with similar relationships). We show that our method generates
referring expressions that people find to be more accurate (30% better)
and would prefer to use (32% more often).
Comment: International Conference on Intelligent Robots and Systems (IROS 2019), Demo 1: Finding the described object (https://youtu.be/BE6-F6chW0w), Demo 2: Referring to the pointed object (https://youtu.be/nmmv6JUpy8M), Supplementary Video (https://youtu.be/sFjBa_MHS98)
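As a rough illustration of the two-stage idea (the scoring function stands in for the paper's learned spatial-relation estimator; all names here are hypothetical):

```python
from typing import Callable, List

def generate_expression(
    target: str,
    objects: List[str],
    relation_score: Callable[[str, str, str], float],  # stage 1: learned relation estimator
    relations=("left of", "right of", "in front of", "behind", "near"),
) -> str:
    """Stage 2: pick the relation/landmark pair that best singles out the target."""
    best, best_margin = None, float("-inf")
    for landmark in objects:
        if landmark == target:
            continue
        for rel in relations:
            # How much better does this description fit the target than the
            # best-fitting distractor? A large margin means low ambiguity.
            fit = relation_score(target, rel, landmark)
            distractor = max(
                (relation_score(o, rel, landmark)
                 for o in objects if o not in (target, landmark)),
                default=float("-inf"),
            )
            if fit - distractor > best_margin:
                best_margin = fit - distractor
                best = f"the {target} {rel} the {landmark}"
    return best
```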
A Deep Incremental Boltzmann Machine for Modeling Context in Robots
Modeling context is an essential capability for robots that need to be as
adaptive as possible in challenging environments. Although there are many context modeling
efforts, they assume a fixed structure and number of contexts. In this paper,
we propose an incremental deep model that extends Restricted Boltzmann
Machines. Our model processes one scene at a time and gradually extends the
contextual model when necessary, either by adding a new context or a new
context layer to form a hierarchy. We show on a scene classification benchmark
that our method converges to a good estimate of the contexts of the scenes, and
performs better than, or on par with, other incremental and non-incremental models on several tasks.
Comment: 6 pages, 5 figures, International Conference on Robotics and Automation (ICRA 2018)
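A toy sketch of the incremental behavior, assuming growth is triggered by reconstruction error (the paper's actual criterion and training procedure are richer than this):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GrowingRBM:
    """RBM-like model whose hidden (context) layer widens on poor fits."""

    def __init__(self, n_visible, n_contexts=1, grow_threshold=0.25):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_contexts))
        self.grow_threshold = grow_threshold

    def reconstruct(self, v):
        h = sigmoid(v @ self.W)        # contextual activations for the scene
        return sigmoid(h @ self.W.T)   # reconstruction from those contexts

    def observe(self, v):
        err = np.mean((v - self.reconstruct(v)) ** 2)
        if err > self.grow_threshold:
            # Existing contexts explain the scene poorly: add a new context
            # unit, seeded from the scene itself.
            self.W = np.hstack([self.W, 0.01 * v.reshape(-1, 1)])
        return err
```

Adding a new context layer when single-unit growth stops helping would extend the same loop into the hierarchy the abstract describes.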
COSMO: Contextualized Scene Modeling with Boltzmann Machines
Scene modeling is crucial for robots that need to perceive, reason about,
and manipulate the objects in their environments. In this paper, we adapt and
extend Boltzmann Machines (BMs) for contextualized scene modeling. Although
there are many models on the subject, ours is the first to bring together
objects, relations, and affordances in a highly capable generative model. To
this end, we introduce a hybrid version of BMs where relations and affordances
are incorporated into the model through shared, tri-way connections. Moreover, we
contribute a dataset for relation estimation and modeling studies. We evaluate
our method in comparison with several baselines on object estimation,
out-of-context object detection, relation estimation, and affordance estimation
tasks. Moreover, to illustrate the generative capability of the model, we show
several example scenes that the model is able to generate.
Comment: 40 pages, 15 figures, 9 tables, accepted to the Robotics and Autonomous Systems (RAS) special issue on Semantic Policy and Action Representations for Autonomous Robots (SPAR)
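One plausible reading of the shared, tri-way connections is an energy function with three-way multiplicative terms that couple a pair of object units to a relation (or affordance) unit; the notation below is illustrative, not taken from the paper:

```latex
% Illustrative hybrid-BM energy; o = object units, r = relation units,
% a = affordance units, W and U are shared tri-way weight tensors.
E(\mathbf{o},\mathbf{r},\mathbf{a}) =
  -\sum_i b_i\,o_i \;-\; \sum_k c_k\,r_k \;-\; \sum_l d_l\,a_l
  \;-\; \sum_{i,j,k} W_{ijk}\,o_i\,o_j\,r_k
  \;-\; \sum_{i,j,l} U_{ijl}\,o_i\,o_j\,a_l
```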
CINet: A Learning Based Approach to Incremental Context Modeling in Robots
There have been several attempts at modeling context in robots. However,
these attempts either assume a fixed number of contexts or use a rule-based
approach to determine when to increment the number of contexts. In this paper,
we pose the task of when to increment as a learning problem, which we solve
using a Recurrent Neural Network. We show that the network successfully (with
98% testing accuracy) learns to predict when to increment, and demonstrate, in
a scene modeling problem (where the correct number of contexts is not known),
that the robot increments the number of contexts in an expected manner (i.e.,
the entropy of the system is reduced). We also present how the incremental
model can be used for various scene reasoning tasks.
Comment: The first two authors contributed equally, 6 pages, 8 figures, International Conference on Intelligent Robots and Systems (IROS 2018)
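A minimal PyTorch sketch of the learning formulation, with the scene encoding and layer sizes as assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class IncrementPredictor(nn.Module):
    """Reads a sequence of scene encodings and decides whether to add a context."""

    def __init__(self, scene_dim=64, hidden_dim=128):
        super().__init__()
        self.rnn = nn.GRU(scene_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, scenes):                    # scenes: (batch, time, scene_dim)
        _, h = self.rnn(scenes)
        return torch.sigmoid(self.head(h[-1]))   # P(increment) after the last scene

model = IncrementPredictor()
p_increment = model(torch.randn(8, 10, 64))       # 8 sequences of 10 scenes each
add_context = p_increment > 0.5                   # binary increment decision
```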
Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks
Most state-of-the-art approaches for Facial Action Unit (AU) detection rely
upon evaluating facial expressions from static frames, encoding a snapshot of
heightened facial activity. In real-world interactions, however, facial
expressions are usually more subtle and evolve in a temporal manner, requiring
AU detection models to learn spatial as well as temporal information. In this
paper, we focus on both spatial and spatio-temporal features encoding the
temporal evolution of facial AU activation. For this purpose, we propose the
Action Unit Lifecycle-Aware Capsule Network (AULA-Caps) that performs AU
detection using both frame and sequence-level features. While at the
frame-level the capsule layers of AULA-Caps learn spatial feature primitives to
determine AU activations, at the sequence-level, it learns temporal
dependencies between contiguous frames by focusing on relevant spatio-temporal
segments in the sequence. The learnt feature capsules are routed together such
that the model learns to selectively focus more on spatial or spatio-temporal
information depending upon the AU lifecycle. The proposed model is evaluated on
the commonly used BP4D and GFT benchmark datasets, obtaining state-of-the-art results on both datasets.
Comment: Updated Figure 6 and the Acknowledgements. Corrected typos. 11 pages, 6 figures, 3 tables
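As a rough schematic of the frame-versus-sequence fusion (a plain two-stream network with a learned gate standing in for capsule routing; none of the module choices below is the AULA-Caps architecture itself):

```python
import torch
import torch.nn as nn

class TwoStreamAU(nn.Module):
    """Fuses a frame-level and a sequence-level feature stream per AU."""

    def __init__(self, feat_dim=256, n_aus=12):
        super().__init__()
        self.frame_enc = nn.Linear(feat_dim, 128)        # spatial primitives
        self.temporal_enc = nn.GRU(feat_dim, 128, batch_first=True)
        self.gate = nn.Linear(256, 1)                    # spatial vs. temporal emphasis
        self.head = nn.Linear(128, n_aus)

    def forward(self, frames):                           # (batch, time, feat_dim)
        f = torch.relu(self.frame_enc(frames[:, -1]))    # current frame
        _, h = self.temporal_enc(frames)
        t = h[-1]                                        # sequence summary
        g = torch.sigmoid(self.gate(torch.cat([f, t], dim=-1)))
        fused = g * f + (1 - g) * t                      # lifecycle-dependent mix
        return torch.sigmoid(self.head(fused))           # per-AU activation
```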
MAGiC: A multimodal framework for analysing gaze in dyadic communication
The analysis of dynamic scenes has been a challenging domain in eye tracking research. This study presents a framework, named MAGiC, for analyzing gaze contact and gaze aversion in face-to-face communication. MAGiC provides an environment that is able to detect and track the conversation partner’s face automatically, overlay gaze data on top of the face video, and incorporate speech by means of speech-act annotation. Specifically, MAGiC integrates eye tracking data for gaze, audio data for speech segmentation, and video data for face tracking. MAGiC is an open source framework, and its usage is demonstrated via publicly available video content and wiki pages. We explored the capabilities of MAGiC through a pilot study and showed that it facilitates the analysis of dynamic gaze data by reducing the annotation effort and the time spent on manual analysis of video data.
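A minimal sketch of the kind of timeline alignment MAGiC automates, assuming gaze samples, face-track boxes, and speech-act segments have already been exported with shared timestamps (the data layout here is an assumption, not MAGiC's actual file format):

```python
from bisect import bisect_right

def annotate(gaze, faces, speech):
    """gaze: [(t, x, y)], faces: [(t, (x1, y1, x2, y2))] sorted by t,
    speech: [(t_start, t_end, act_label)]."""
    face_times = [t for t, _ in faces]
    rows = []
    for t, x, y in gaze:
        i = max(bisect_right(face_times, t) - 1, 0)   # nearest earlier face box
        x1, y1, x2, y2 = faces[i][1]
        on_face = x1 <= x <= x2 and y1 <= y <= y2
        act = next((lbl for s, e, lbl in speech if s <= t <= e), None)
        rows.append((t, "contact" if on_face else "aversion", act))
    return rows
```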
The Effect of Movement and Play-Based Music Education on Musical Skills of Students Affected by Mental Disability
The research aims to determine the effect of movement and play-based music education on the musical skills (applying dynamics, body playing, singing) of students with moderate intellectual disability. Within this scope, the goal was to improve the dynamics application skills, body playing skills, and singing skills of students with special needs. The study used the multiple probe design across behaviors, one of the single-subject experimental designs. A student affected by moderate intellectual disability participated in the study. The findings showed that the effects of Movement and Play-Based Music Education (MPBME) on the student's dynamics application, body playing, and singing skills were statistically significant and positive. The student developed these skills, demonstrated them across different applications, and maintained them over time.